Abstract: This is a tale of two linear programming decoders, 
namely channel coding linear programming decoding (CC- 
LPD) and compressed sensing linear programming decoding 
(CS-LPD). So far, they have evolved quite independently. The 
aim of the present paper is to show that there is a tight 
connection between, on the one hand, CS-LPD based on a zero- 
one measurement matrix over the reals and, on the other hand, 
CC-LPD of the binary linear code that is obtained by viewing 
this measurement matrix as a binary parity-check matrix. This 
connection allows one to translate performance guarantees from 
one setup to the other. 

I. Introduction 

Recently there has been substantial interest in the theory 
of recovering sparse approximations of signals that satisfy 
linear measurements. Compressed (or compressive) sensing 
research (see, e.g., [1], [2]) has developed conditions for 
measurement matrices under which (approximately) sparse 
signals can be recovered by solving a linear programming 
relaxation of the original NP-hard combinatorial problem. 
Interestingly, in one of the first papers in this area (cf. [1]), 
Candes and Tao presented a setup they called "decoding by 
linear programming," henceforth called CS-LPD, where the 
sparse signal corresponds to real-valued noise that is added to 
a real-valued signal that is to be recovered in a hypothetical 
communication problem. 

At about the same time, in an independent line of research, 
Feldman, Wainwright, and Karger considered the problem of 
decoding a binary linear code that is used for data commu- 
nication over a binary-input memoryless channel, a problem 
that is also NP-hard in general. In [3], [4], they formulated 
this channel coding problem as an integer linear program, 
along with presenting a linear programming relaxation for 
it, henceforth called CC-LPD. Several theoretical results 
were subsequently proven about the efficiency of CC-LPD, 
in particular for low-density parity-check (LDPC) codes 
(e.g. [5], [6], [7], [8]). 

As we will see in the subsequent sections, CS-LPD 
and CC-LPD (and the setups they are derived from) are 
formally very similar. However, it is rather unclear whether 
there is a connection beyond this formal relationship. In fact, 
Candes and Tao asked the following question in their original 
paper [1, Section VI.A]: "...In summary, there does not 
seem to be any explicit known connection with this line of 
work¹ but it would perhaps be of future interest to explore 
if there is one." 

In this paper we present such a connection between CS- 
LPD and CC-LPD. The general form of our results is that 
if a given binary parity-check matrix is "good" for CC-LPD 
then the same matrix (considered over the reals) is a "good" 
measurement matrix for CS-LPD. The notion of a "good" 
parity-check matrix depends on which channel we use (and 
a corresponding channel-dependent quantity called pseudo- 
weight). 

• Based on results for the binary symmetric channel 
(BSC), we show that if a parity-check matrix can correct 
any k bit-flipping errors under CC-LPD, then the same 
matrix taken as a measurement matrix over the reals 
can be used to recover all k-sparse error signals under 
CS-LPD. 

• Based on results for binary-input output-symmetric 
channels with bounded log-likelihood ratios, we can 
extend the previous result to show that performance 
guarantees for CC-LPD for such channels can be translated 
into robust sparse-recovery guarantees in the $\ell_1/\ell_1$ 
sense (see, e.g., [9]) for CS-LPD. 

• Performance guarantees for CC-LPD for the binary-input 
AWGNC (additive white Gaussian noise channel) 
can be translated into robust sparse-recovery guarantees 
in the $\ell_2/\ell_1$ sense for CS-LPD. 

• Max-fractional weight performance guarantees for CC-LPD 
can be translated into robust sparse-recovery guarantees 
in the $\ell_\infty/\ell_1$ sense for CS-LPD. 

• Performance guarantees for CC-LPD for the BEC (bi- 
nary erasure channel) can be translated into performance 
guarantees for the compressed sensing setup where the 
support of the error signal is known and the decoder 
tries to recover the sparse signal (i.e., tries to solve the 
linear equations) by back-substitution only. 

All our results are also valid in a stronger, point-wise sense. 
For example, for the BSC, if a parity-check matrix can 
recover a given set of k bit flips under CC-LPD, the same 
matrix will recover any sparse signal supported on those k 
coordinates under CS-LPD. In general, "good" performance 
of CC-LPD on a given error support will yield "good" 
CS-LPD recovery for sparse signals supported on the same 
support. 

It should be noted that all our results are only one-way: we 
do not prove that a "good" zero-one measurement matrix will 
always be a "good" parity-check matrix for a binary code. 
This remains an interesting open problem. 

¹ Candes and Tao [1, Section VI.A] refer here to [3], [4]. 

The remainder of this paper is organized as follows. In 
Section II we set up the notation that will be used. Then in 
Sections III and IV we will review the compressed sensing 
and channel coding setups that we are interested in, along 
with their respective linear programming relaxations. This 
review will be presented in such a way that the close 
formal relationship between the two setups will stand out. 
Afterwards, in Section V we will show that for a zero-one 
matrix, viewed on the one hand as a real-valued measurement 
matrix and on the other hand as a binary parity-check matrix, 
this close relationship is not only formal: in fact, non-zero 
vectors in the real nullspace of this matrix (i.e., vectors that 
are problematic for CS-LPD) can be mapped to non-zero 
vectors in the fundamental cone defined by that same matrix 
(i.e., to vectors that are problematic for CC-LPD). Based 
on this observation one can, as will be shown in Section VI, 
translate performance guarantees from one setup to the other. 
The paper finishes with some conclusions in Section VII. 

II. Basic Notation 

Let $\mathbb{Z}$, $\mathbb{Z}_{\geq 0}$, $\mathbb{Z}_{>0}$, $\mathbb{R}$, $\mathbb{R}_{\geq 0}$, $\mathbb{R}_{>0}$, and $\mathbb{F}_2$ be the ring of 
integers, the set of non-negative integers, the set of positive 
integers, the field of real numbers, the set of non-negative real 
numbers, the set of positive real numbers, and the finite field 
of size 2, respectively. Unless noted otherwise, expressions, 
equalities, and inequalities will be over the field $\mathbb{R}$. The 
absolute value of a real number $a$ will be denoted by $|a|$. 
The size of a set $S$ will be denoted by $\#S$. 

In this paper all vectors will be column vectors. If $a$ is 
some vector with integer entries, then $a \pmod 2$ will denote 
an equally long vector whose entries are reduced modulo 2. 
If $S$ is a subset of the set of coordinate indices of a vector 
$a$ then $a_S$ is the vector of length $\#S$ that contains only 
the coordinates of $a$ whose coordinate index appears in $S$. 
Moreover, if $a$ is a real vector then we define $|a|$ to be the 
real vector $a'$ of the same length as $a$ with entries $a'_i = |a_i|$ 
for all $i$. Finally, the inner product $\langle a, b \rangle$ of two equally long 
vectors $a$ and $b$ is defined to be $\langle a, b \rangle = \sum_i a_i b_i$. 

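We define $\mathrm{supp}(a) \triangleq \{i \mid a_i \neq 0\}$ to be the support set 
of some vector $a$. Moreover, we let $\Sigma^{(k)}_{\mathbb{R}^n} \triangleq \{a \in \mathbb{R}^n \mid \#\mathrm{supp}(a) \leq k\}$ 
and $\Sigma^{(k)}_{\mathbb{F}_2^n} \triangleq \{a \in \mathbb{F}_2^n \mid \#\mathrm{supp}(a) \leq k\}$ 
be the set of vectors in $\mathbb{R}^n$ and $\mathbb{F}_2^n$, respectively, which have 
at most $k$ non-zero components. If $k \ll n$ then vectors in 
these sets are called $k$-sparse vectors. 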

For any real vector $a$, we define $\|a\|_0$ to be the $\ell_0$ 
norm of $a$, i.e., the number of non-zero components of $a$. 
Note that $\|a\|_0 = w_{\mathrm{H}}(a) = \#\mathrm{supp}(a)$, where $w_{\mathrm{H}}(a)$ is 
the Hamming weight of $a$. Furthermore, $\|a\|_1 \triangleq \sum_i |a_i|$, 
$\|a\|_2 \triangleq \sqrt{\sum_i |a_i|^2}$, and $\|a\|_\infty \triangleq \max_i |a_i|$ will denote, 
respectively, the $\ell_1$, $\ell_2$, and $\ell_\infty$ norm of $a$. 

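For a matrix $M$ over $\mathbb{R}$ with $n$ columns we define its 
$\mathbb{R}$ nullspace to be the set $\mathrm{nullspace}_{\mathbb{R}}(M) \triangleq \{a \in \mathbb{R}^n \mid M \cdot a = 0\}$, 
and for a matrix $M$ over $\mathbb{F}_2$ with $n$ columns 
we define its $\mathbb{F}_2$ nullspace to be the set $\mathrm{nullspace}_{\mathbb{F}_2}(M) \triangleq 
\{a \in \mathbb{F}_2^n \mid M \cdot a = 0 \text{ (in } \mathbb{F}_2)\}$. 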

Let $H = (h_{j,i})_{j,i}$ be some matrix. We define the sets 
$\mathcal{J}(H)$ and $\mathcal{I}(H)$ to be, respectively, the set of row and 
column indices of $H$. Moreover, we will use the sets $\mathcal{J}_i(H) \triangleq 
\{j \in \mathcal{J} \mid h_{j,i} \neq 0\}$ and $\mathcal{I}_j(H) \triangleq \{i \in \mathcal{I} \mid h_{j,i} \neq 0\}$. In the 
following, when no confusion can arise, we will sometimes 
omit the argument $H$ in the preceding expressions. For any 
set $S \subseteq \mathcal{I}$, we will denote its complement with respect to $\mathcal{I}$ 
by $\bar{S}$, i.e., $\bar{S} \triangleq \mathcal{I} \setminus S$. 

III. Compressed Sensing 
Linear Programming Decoding 

A. The Setup 

Let $H_{\mathrm{CS}}$ be a real matrix of size $m \times n$, called the 
measurement matrix, and let $s$ be a real vector of length 
$m$. In its simplest form, the compressed sensing problem 
consists of finding the sparsest real vector $e'$ of length $n$ 
that satisfies $H_{\mathrm{CS}} \cdot e' = s$, namely 

CS-OPT : minimize $\|e'\|_0$ 
         subject to $H_{\mathrm{CS}} \cdot e' = s$. 

Assuming that there exists a truly sparse signal $e$ that satisfies 
the measurement $H_{\mathrm{CS}} \cdot e = s$, CS-OPT yields, for suitable 
matrices $H_{\mathrm{CS}}$, an estimate $\hat{e}$ that equals $e$. 

This problem can also be interpreted [1] as part of the 
decoding problem that appears in a coded data communication 
setup where the channel input alphabet is $\mathcal{X}_{\mathrm{CS}} \triangleq \mathbb{R}$, 
the channel output alphabet is $\mathcal{Y}_{\mathrm{CS}} \triangleq \mathbb{R}$, and the information 
symbols are encoded with the help of a real-valued code $\mathcal{C}_{\mathrm{CS}}$ 
of length $n$ and dimension $\kappa \triangleq n - \mathrm{rank}_{\mathbb{R}}(H_{\mathrm{CS}})$ as follows. 

• The code is $\mathcal{C}_{\mathrm{CS}} \triangleq \{x \in \mathbb{R}^n \mid H_{\mathrm{CS}} \cdot x = 0\}$. Because 
of this, the measurement matrix $H_{\mathrm{CS}}$ is sometimes also 
called an annihilator matrix. 

• A matrix $G_{\mathrm{CS}} \in \mathbb{R}^{n \times \kappa}$ for which $\mathcal{C}_{\mathrm{CS}} = \{G_{\mathrm{CS}} \cdot u \mid 
u \in \mathbb{R}^{\kappa}\}$ holds is called a generator matrix for the 
code $\mathcal{C}_{\mathrm{CS}}$. With the help of such a matrix, information 
vectors $u \in \mathbb{R}^{\kappa}$ are encoded into codewords $x \in \mathbb{R}^n$ 
according to $x = G_{\mathrm{CS}} \cdot u$. 

• Let $y \in \mathcal{Y}_{\mathrm{CS}}^n$ be the received vector. We can write $y = 
x + e$ for a suitably defined vector $e \in \mathbb{R}^n$, which will 
be called the error vector. We assume that the channel 
is such that $e$ is sparse or approximately sparse. 

• The receiver first computes the syndrome vector $s$ 
according to $s \triangleq H_{\mathrm{CS}} \cdot y$. Note that 

$$ s = H_{\mathrm{CS}} \cdot (x + e) = H_{\mathrm{CS}} \cdot x + H_{\mathrm{CS}} \cdot e = H_{\mathrm{CS}} \cdot e. $$

In a second step, the receiver solves CS-OPT to obtain 
an estimate $\hat{e}$ for $e$, which can be used to obtain the 
codeword estimate $\hat{x} = y - \hat{e}$, which in turn can be 
used to obtain the information word estimate $\hat{u}$. 
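
The encoding and syndrome steps listed above can be traced on a toy example. The following is only a minimal sketch: the matrices `H` and `G`, the information vector `u`, and the error vector `e` are made-up illustrative values, and `mat_vec` is a hypothetical helper, none of which come from the paper.

```python
# Toy illustration of the real-valued "coding" view of compressed sensing.
# All numerical values below are hypothetical examples.

def mat_vec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector over the reals."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

# Zero-one measurement (annihilator) matrix of size m x n = 2 x 3.
H = [[1, 1, 0],
     [0, 1, 1]]

# Generator matrix of size n x kappa = 3 x 1; its column spans the real nullspace of H.
G = [[1], [-1], [1]]

u = [2.5]                                    # information vector
x = mat_vec(G, u)                            # codeword, so H x = 0
e = [0.0, 0.0, 4.0]                          # sparse error vector (one non-zero entry)
y = [x_i + e_i for x_i, e_i in zip(x, e)]    # received vector y = x + e

syndrome_of_y = mat_vec(H, y)                # what the receiver computes
syndrome_of_e = mat_vec(H, e)                # depends on the error only

print("H x =", mat_vec(H, x))
print("H y =", syndrome_of_y)
print("H e =", syndrome_of_e)
```

The syndrome of the received vector coincides with the syndrome of the error vector, which is why the receiver can work with $s$ alone when estimating $e$.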



Because the complexity of solving CS-OPT is usually 
exponential in the relevant parameters, one can try to for- 
mulate and solve a related optimization problem with the 
aim that the related optimization problem yields very often 
the same solution as CS-OPT, or at least very often a very 
good approximation to the solution given by CS-OPT. In 
the context of CS-OPT, a popular approach is to formulate 
and solve the following related optimization problem (which, 
with the suitable introduction of auxiliary variables, can be 
turned into a linear program): 



CS-LPD : minimize $\|e'\|_1$ 
         subject to $H_{\mathrm{CS}} \cdot e' = s$. 
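
One common way to carry out the introduction of auxiliary variables mentioned above is sketched next. The variable names $t_1, \ldots, t_n$ are an assumption made here for illustration and do not appear in the paper.

```latex
\begin{align*}
\text{minimize}   \quad & \sum_{i=1}^{n} t_i \\
\text{subject to} \quad & H_{\mathrm{CS}} \cdot e' = s, \\
                        & -t_i \le e'_i \le t_i \quad (i = 1, \ldots, n).
\end{align*}
```

The objective and all constraints are linear in $(e', t)$, and at any optimum $t_i = |e'_i|$, so the optimal value equals the minimal $\|e'\|_1$.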



B. Conditions for the Equivalence of CS-LPD and CS-OPT 

A central question of compressed sensing theory is under 
what conditions the solution given by CS-LPD equals (or 
is very close to) the solution given by CS-OPT.² Clearly, if 
$m \geq n$ and the matrix $H_{\mathrm{CS}}$ has rank $n$, there is only one 
feasible $e'$ and the two problems have the same solution. 

In this paper we typically focus on the linear sparsity 
regime, i.e., $k = \Theta(n)$ and $m = \Theta(n)$, but our techniques 
are more generally applicable. The question is for which 
measurement matrices (hopefully with a small number of 
measurements $m$) the LP relaxation is tight, i.e., the estimate 
given by CS-LPD equals the estimate given by CS-OPT. 
One sufficient condition for a measurement matrix to be 
"good" is the restricted isometry property (RIP), i.e., the 
matrix approximately preserves the $\ell_2$ length of all $k$-sparse 
vectors. If this is the case then it was shown [1] that the 
LP relaxation will be tight for all $k$-sparse vectors $e$ and, 
further, that the recovery will be robust to approximate sparsity. 
The RIP condition, however, is not a complete characterization 
of "good" measurement matrices. We will instead use the 
nullspace characterization (see, e.g., [10], [11]), which is 
necessary and sufficient. 

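Definition 1 Let $S \subseteq \mathcal{I}(H_{\mathrm{CS}})$ and let $C \in \mathbb{R}_{\geq 0}$. We say 
that $H_{\mathrm{CS}}$ has the nullspace property $\mathrm{NSP}^{\leq}_{\mathbb{R}}(S, C)$, and write 
$H_{\mathrm{CS}} \in \mathrm{NSP}^{\leq}_{\mathbb{R}}(S, C)$, if 

$$ C \cdot \|\nu_S\|_1 \leq \|\nu_{\bar{S}}\|_1 \quad \text{for all } \nu \in \mathrm{nullspace}_{\mathbb{R}}(H_{\mathrm{CS}}). $$

We say that $H_{\mathrm{CS}}$ has the strict nullspace property 
$\mathrm{NSP}^{<}_{\mathbb{R}}(S, C)$, and write $H_{\mathrm{CS}} \in \mathrm{NSP}^{<}_{\mathbb{R}}(S, C)$, if 

$$ C \cdot \|\nu_S\|_1 < \|\nu_{\bar{S}}\|_1 \quad \text{for all } \nu \in \mathrm{nullspace}_{\mathbb{R}}(H_{\mathrm{CS}}) \setminus \{0\}. $$

□ 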



² It is important to note that we worry only about the solution given by 
CS-LPD being equal to (or very close to) the solution given by CS-OPT, 
because even CS-OPT might fail to correctly estimate the error vector in 
the above communication setup when the error vector has too many large 
components. 



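Definition 2 Let $k \in \mathbb{Z}_{\geq 0}$ and let $C \in \mathbb{R}_{\geq 0}$. We say that 
$H_{\mathrm{CS}}$ has the nullspace property $\mathrm{NSP}^{\leq}_{\mathbb{R}}(k, C)$, and write 
$H_{\mathrm{CS}} \in \mathrm{NSP}^{\leq}_{\mathbb{R}}(k, C)$, if 

$$ H_{\mathrm{CS}} \in \mathrm{NSP}^{\leq}_{\mathbb{R}}(S, C) \quad \text{for all } S \subseteq \mathcal{I}(H_{\mathrm{CS}}) \text{ with } \#S \leq k. $$

We say that $H_{\mathrm{CS}}$ has the strict nullspace property 
$\mathrm{NSP}^{<}_{\mathbb{R}}(k, C)$, and write $H_{\mathrm{CS}} \in \mathrm{NSP}^{<}_{\mathbb{R}}(k, C)$, if 

$$ H_{\mathrm{CS}} \in \mathrm{NSP}^{<}_{\mathbb{R}}(S, C) \quad \text{for all } S \subseteq \mathcal{I}(H_{\mathrm{CS}}) \text{ with } \#S \leq k. $$

□ 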

As was shown independently by several authors (see 
[12], [13], [14], [11] and references therein) the nullspace 
condition in Definition 2 is a necessary and sufficient 
condition for a measurement matrix to be "good" for $k$-sparse 
signals, i.e., that the estimate given by CS-LPD equals the 
estimate given by CS-OPT for these matrices. The nullspace 
characterization of "good" measurement matrices will be one 
of the keys to linking CS-LPD with CC-LPD. Observe that 
the requirement is that vectors in the nullspace of $H_{\mathrm{CS}}$ have 
their $\ell_1$ mass spread in substantially more than $k$ coordinates. 
The following theorem is adapted from [11] (and references 
therein). 

Theorem 3 Let $H_{\mathrm{CS}}$ be a measurement matrix. Further, 
assume that $s = H_{\mathrm{CS}} \cdot e$ and that $e$ has at most $k$ non-zero 
elements, i.e., $\|e\|_0 \leq k$. Then the estimate $\hat{e}$ produced by 
CS-LPD will equal the estimate produced by CS-OPT if 
$H_{\mathrm{CS}} \in \mathrm{NSP}^{<}_{\mathbb{R}}(k, C{=}1)$. 

Remark: Actually, as discussed in [11] and references 
therein, the condition $H_{\mathrm{CS}} \in \mathrm{NSP}^{<}_{\mathbb{R}}(k, C{=}1)$ is also 
necessary, but we will not use this here. 
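
To make the nullspace condition concrete, the following sketch checks the inequality of Definition 2 for a single nullspace vector over all index sets of size at most $k$. The function name `nsp_holds_for_vector` and the example matrix are hypothetical. Because the inequality is invariant under scaling of the vector, checking one spanning vector settles the property only when the real nullspace is one-dimensional, as in this toy case; it is not a general test.

```python
from itertools import combinations

def l1(values):
    """l1 norm of an iterable of real numbers."""
    return sum(abs(v) for v in values)

def nsp_holds_for_vector(v, k, C=1.0, strict=True):
    """Check C*||v_S||_1 < ||v_Sbar||_1 (or <= if strict is False)
    for every index set S with #S <= k, for one fixed non-zero vector v."""
    n = len(v)
    for size in range(k + 1):
        for S in combinations(range(n), size):
            inside = l1(v[i] for i in S)
            outside = l1(v[i] for i in range(n) if i not in S)
            ok = C * inside < outside if strict else C * inside <= outside
            if not ok:
                return False
    return True

# Hypothetical example: H = [[1, 1, 0], [0, 1, 1]] has a one-dimensional
# real nullspace spanned by v = (1, -1, 1).
v = [1.0, -1.0, 1.0]
holds_k1 = nsp_holds_for_vector(v, k=1)   # any single coordinate has l1 mass 1 < 2
holds_k2 = nsp_holds_for_vector(v, k=2)   # two coordinates have l1 mass 2 > 1
print(holds_k1, holds_k2)
```

For this toy matrix the strict property holds with $k = 1$ and $C = 1$ but fails with $k = 2$, matching the intuition that nullspace vectors must spread their $\ell_1$ mass over more than $k$ coordinates.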

The next performance metric (see, e.g., [9], [15]) for CS 
involves recovering sparse approximations to signals that are 
not exactly $k$-sparse. 

Definition 4 An $\ell_p/\ell_q$ approximation guarantee for CS-LPD 
means that CS-LPD outputs an estimate $\hat{e}$ that is 
within a factor $C_{p,q}(k)$ from the best $k$-sparse approximation 
for $e$, i.e., 

$$ \|e - \hat{e}\|_p \leq C_{p,q}(k) \cdot \min_{e' \in \Sigma^{(k)}_{\mathbb{R}^n}} \|e - e'\|_q, \quad (1) $$

where the left-hand side is measured in the $\ell_p$ norm and the 
right-hand side is measured in the $\ell_q$ norm. □ 

Note that the minimizer of the right-hand side of (1) (for 
any norm) is the vector $e' \in \Sigma^{(k)}_{\mathbb{R}^n}$ that has the $k$ largest 
(in magnitude) coordinates of $e$, also called the best $k$-term 
approximation of $e$ [15]. Therefore the right-hand side of (1) 
equals $C_{p,q}(k) \cdot \|e_{\bar{S^*}}\|_q$ where $S^*$ is the support set of the $k$ 
largest (in magnitude) components of $e$. Also note that if $e$ 
is exactly $k$-sparse the above condition implies that $\hat{e} = e$ 
since the right-hand side of (1) vanishes; therefore it is a 
strictly stronger statement than recovery of sparse signals. 
(Of course, such a stronger approximation guarantee for $\hat{e}$ 
is usually only obtained under stronger assumptions on the 
measurement matrix.) 



The nullspace condition is necessary and sufficient for 
$\ell_1/\ell_1$ approximation for any measurement matrix. This is 
shown in the next theorem and proof which are adapted 
from [10, Theorem 1]. (Actually, we omit the necessity part 
in the next theorem since it will not be needed in this paper.) 

Theorem 5 Let $H_{\mathrm{CS}}$ be a measurement matrix and choose 
some constant $C > 1$. Further, assume that $s = H_{\mathrm{CS}} \cdot e$. 
Then for any set $S \subseteq \mathcal{I}$ with $\#S \leq k$ the solution $\hat{e}$ produced 
by CS-LPD will satisfy 

$$ \|e - \hat{e}\|_1 \leq 2 \cdot \frac{C+1}{C-1} \cdot \|e_{\bar{S}}\|_1 $$

if $H_{\mathrm{CS}} \in \mathrm{NSP}^{\leq}_{\mathbb{R}}(k, C)$. 

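Proof: Suppose that $H_{\mathrm{CS}}$ has the claimed nullspace 
property. Since $H_{\mathrm{CS}} \cdot e = s$ and $H_{\mathrm{CS}} \cdot \hat{e} = s$, it easily 
follows that $\nu \triangleq \hat{e} - e$ is in the nullspace of $H_{\mathrm{CS}}$. So, 

$$
\begin{aligned}
\|e_S\|_1 + \|e_{\bar{S}}\|_1 &= \|e\|_1 \\
&\overset{(a)}{\geq} \|\hat{e}\|_1 \\
&= \|e + \nu\|_1 \\
&= \|e_S + \nu_S\|_1 + \|e_{\bar{S}} + \nu_{\bar{S}}\|_1 \\
&\overset{(b)}{\geq} \|e_S\|_1 - \|\nu_S\|_1 + \|\nu_{\bar{S}}\|_1 - \|e_{\bar{S}}\|_1 \\
&\overset{(c)}{\geq} \|e_S\|_1 + \frac{C-1}{C+1} \cdot \|\nu\|_1 - \|e_{\bar{S}}\|_1, \quad (2)
\end{aligned}
$$

where step (a) follows from the fact that the solution to CS-LPD 
satisfies $\|\hat{e}\|_1 \leq \|e\|_1$, where step (b) follows from 
applying the triangle inequality for the $\ell_1$ norm twice, and 
where step (c) follows from 

$$ -\|\nu_S\|_1 + \|\nu_{\bar{S}}\|_1 \overset{(d)}{\geq} \frac{C-1}{C+1} \cdot \|\nu\|_1. $$

Here, step (d) is a consequence of 

$$
\begin{aligned}
(C+1) \cdot \big( -\|\nu_S\|_1 + \|\nu_{\bar{S}}\|_1 \big)
&= -C \cdot \|\nu_S\|_1 - \|\nu_S\|_1 + C \cdot \|\nu_{\bar{S}}\|_1 + \|\nu_{\bar{S}}\|_1 \\
&\overset{(e)}{\geq} -\|\nu_{\bar{S}}\|_1 - \|\nu_S\|_1 + C \cdot \|\nu_{\bar{S}}\|_1 + C \cdot \|\nu_S\|_1 \\
&= (C-1) \cdot \|\nu_{\bar{S}}\|_1 + (C-1) \cdot \|\nu_S\|_1 \\
&= (C-1) \cdot \|\nu\|_1,
\end{aligned}
$$

where step (e) follows from applying twice the fact that 
$\nu \in \mathrm{nullspace}_{\mathbb{R}}(H_{\mathrm{CS}})$ and the assumption that $H_{\mathrm{CS}} \in 
\mathrm{NSP}^{\leq}_{\mathbb{R}}(k, C)$. Subtracting the term $\|e_S\|_1$ on both sides 
of (2), and solving for $\|\nu\|_1 = \|e - \hat{e}\|_1$ yields the promised 
result. ■ 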

IV. Channel Coding 
Linear Programming Decoding 

A. The Setup 

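We consider coded data transmission over a memoryless 
channel with input alphabet $\mathcal{X}_{\mathrm{CC}} \triangleq \{0, 1\}$, output alphabet 
$\mathcal{Y}_{\mathrm{CC}}$, and channel law $P_{Y|X}(y|x)$ with the help of a binary 
linear code $\mathcal{C}_{\mathrm{CC}}$ of length $n$ and dimension $\kappa$ with $n \geq \kappa$. 
In the following, we will identify $\mathcal{X}_{\mathrm{CC}}$ with $\mathbb{F}_2$. 

• Let $G_{\mathrm{CC}} \in \mathbb{F}_2^{n \times \kappa}$ be a generator matrix for $\mathcal{C}_{\mathrm{CC}}$. 
Consequently, $G_{\mathrm{CC}}$ has rank $\kappa$ over $\mathbb{F}_2$, and information 
vectors $u \in \mathbb{F}_2^{\kappa}$ are encoded into codewords $x \in \mathbb{F}_2^n$ 
according to $x = G_{\mathrm{CC}} \cdot u$ (in $\mathbb{F}_2$), i.e., $\mathcal{C}_{\mathrm{CC}} = 
\{G_{\mathrm{CC}} \cdot u \text{ (in } \mathbb{F}_2) \mid u \in \mathbb{F}_2^{\kappa}\}$.³ 

• Let $H_{\mathrm{CC}} \in \mathbb{F}_2^{m \times n}$ be a parity-check matrix for $\mathcal{C}_{\mathrm{CC}}$. 
Consequently, $H_{\mathrm{CC}}$ has rank $n - \kappa \leq m$ over $\mathbb{F}_2$, and 
any $x \in \mathbb{F}_2^n$ satisfies $H_{\mathrm{CC}} \cdot x = 0$ (in $\mathbb{F}_2$) if and only if 
$x \in \mathcal{C}_{\mathrm{CC}}$, i.e., $\mathcal{C}_{\mathrm{CC}} = \{x \in \mathbb{F}_2^n \mid H_{\mathrm{CC}} \cdot x = 0 \text{ (in } \mathbb{F}_2)\}$. 

• Let $y \in \mathcal{Y}_{\mathrm{CC}}^n$ be the received vector and define for each 
$i \in \mathcal{I}(H_{\mathrm{CC}})$ the log-likelihood ratio $\lambda_i \triangleq \lambda_i(y_i) \triangleq 
\log \big( P_{Y|X}(y_i|0) / P_{Y|X}(y_i|1) \big)$. 

• On the side, let us remark that if $\mathcal{Y}_{\mathrm{CC}}$ is binary then 
$\mathcal{Y}_{\mathrm{CC}}$ can be identified with $\mathbb{F}_2$ and we can write $y = 
x + e$ (in $\mathbb{F}_2$) for a suitably defined vector $e \in \mathbb{F}_2^n$, 
which will be called the error vector. Moreover, we can 
define the syndrome vector $s \triangleq H_{\mathrm{CC}} \cdot y$ (in $\mathbb{F}_2$). Note 
that 

$$ s = H_{\mathrm{CC}} \cdot (x + e) = H_{\mathrm{CC}} \cdot x + H_{\mathrm{CC}} \cdot e = H_{\mathrm{CC}} \cdot e \quad \text{(in } \mathbb{F}_2). $$

However, in the following we will only use the log-likelihood 
ratio vector $\lambda$ (that can be defined for any 
alphabet $\mathcal{Y}_{\mathrm{CC}}$), and not the binary syndrome vector $s$. 

Upon observing $Y = y$, the maximum-likelihood decoding 
(MLD) rule decides for $\hat{x}(y) = \arg\max_{x' \in \mathcal{C}_{\mathrm{CC}}} P_{Y|X}(y|x')$ 
where $P_{Y|X}(y|x') = \prod_{i \in \mathcal{I}} P_{Y|X}(y_i|x'_i)$.⁴ Formally: 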



CC-MLD1 : maximize $P_{Y|X}(y|x')$ 
          subject to $x' \in \mathcal{C}_{\mathrm{CC}}$. 

It is clear that instead of $P_{Y|X}(y|x')$ we can also maximize 
$\log P_{Y|X}(y|x') = \sum_{i \in \mathcal{I}} \log P_{Y|X}(y_i|x'_i)$. Noting that 
$\log P_{Y|X}(y_i|x'_i) = -\lambda_i x'_i + \log P_{Y|X}(y_i|0)$ for $x'_i \in \{0, 1\}$, 
CC-MLD1 can then be rewritten to read 

CC-MLD2 : minimize $\langle \lambda, x' \rangle$ 
          subject to $x' \in \mathcal{C}_{\mathrm{CC}}$. 

Because the cost function is linear, and a linear function 
attains its minimum at the extremal points of a convex set, 
this is essentially equivalent to 

CC-MLD3 : minimize $\langle \lambda, x' \rangle$ 
          subject to $x' \in \mathrm{conv}(\mathcal{C}_{\mathrm{CC}})$. 



Although this is a linear program, it can usually not be solved 
efficiently because its description complexity is typically 
exponential in the block length of the code.⁵ 

³ We remind the reader that throughout this paper we are using column 
vectors, which is in contrast to the coding theory habit to use row vectors. 

⁴ Actually, slightly more precise would be to call this decision rule "block- 
wise maximum-likelihood decoding." 

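However, one might try to solve a relaxation of CC-MLD3. 
Namely, as proposed by Feldman, Wainwright, and 
Karger [3], [4], we can try to solve the optimization problem 

CC-LPD : minimize $\langle \lambda, x' \rangle$ 
         subject to $x' \in \mathcal{P}(H_{\mathrm{CC}})$, 

where the relaxed set $\mathcal{P}(H_{\mathrm{CC}}) \supseteq \mathrm{conv}(\mathcal{C}_{\mathrm{CC}})$ is given in the 
next definition. 

Definition 6 For every $j \in \mathcal{J}(H_{\mathrm{CC}})$, let $h_j$ be the $j$-th row 
of $H_{\mathrm{CC}}$ and let $\mathcal{C}_{\mathrm{CC},j} \triangleq \{x \in \mathbb{F}_2^n \mid \langle h_j, x \rangle = 0 \pmod 2\}$. 
Then, the fundamental polytope $\mathcal{P} \triangleq \mathcal{P}(H_{\mathrm{CC}})$ of $H_{\mathrm{CC}}$ is 
defined to be the set 

$$ \mathcal{P} \triangleq \mathcal{P}(H_{\mathrm{CC}}) \triangleq \bigcap_{j \in \mathcal{J}(H_{\mathrm{CC}})} \mathrm{conv}(\mathcal{C}_{\mathrm{CC},j}). $$

Vectors in $\mathcal{P}(H_{\mathrm{CC}})$ will be called pseudo-codewords. □ 

In order to motivate this relaxation, note that the code $\mathcal{C}_{\mathrm{CC}}$ 
can be written as 

$$ \mathcal{C}_{\mathrm{CC}} = \mathcal{C}_{\mathrm{CC},1} \cap \cdots \cap \mathcal{C}_{\mathrm{CC},m}, $$

and so 

$$
\begin{aligned}
\mathrm{conv}(\mathcal{C}_{\mathrm{CC}}) &= \mathrm{conv}(\mathcal{C}_{\mathrm{CC},1} \cap \cdots \cap \mathcal{C}_{\mathrm{CC},m}) \\
&\subseteq \mathrm{conv}(\mathcal{C}_{\mathrm{CC},1}) \cap \cdots \cap \mathrm{conv}(\mathcal{C}_{\mathrm{CC},m}) \\
&= \mathcal{P}(H_{\mathrm{CC}}).
\end{aligned}
$$

It can be verified [3], [4] that this relaxation possesses the 
important property that all the vertices of $\mathrm{conv}(\mathcal{C}_{\mathrm{CC}})$ are also 
vertices of $\mathcal{P}(H_{\mathrm{CC}})$. Let us emphasize that different parity-check 
matrices for the same code usually lead to different 
fundamental polytopes and therefore to different CC-LPDs. 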

Similarly to the compressed sensing setup, we want to 
understand when we can guarantee that the codeword esti- 
mate given by CC-LPD equals the codeword estimate given 
by CC-MLD. It is important to note, as we did in the 
compressed sensing setup, that we worry mostly about the 
solution given by CC-LPD being equal to the solution given 
by CC-MLD, because even CC-MLD might fail to correctly 
identify the codeword that was sent when the error vector is 
beyond the error correction capability of the code. Therefore, 
the performance of CC-MLD is a natural upper bound on 
the performance of CC-LPD, and a way to assess CC-LPD 
is to study the gap to CC-MLD, e.g., by comparing the 
performance guarantees for CC-LPD that are discussed here 
with known performance guarantees for CC-MLD. 

When characterizing the CC-LPD performance of binary 
linear codes over binary-input output-symmetric channels [17] 
we can without loss of generality assume that the 
all-zero codeword was transmitted. With this, the success 
probability of CC-LPD is the probability that the all-zero 
codeword yields the lowest cost function value compared to 
all non-zero vectors in the fundamental polytope. Because the 
cost function is linear, this is equivalent to the statement that 
the success probability of CC-LPD equals the probability 
that the all-zero codeword yields the lowest cost function 
value compared to all non-zero vectors in the conic hull 
of the fundamental polytope. This conic hull is called the 
fundamental cone $\mathcal{K} \triangleq \mathcal{K}(H_{\mathrm{CC}})$ and it can be written as 

$$ \mathcal{K} \triangleq \mathcal{K}(H_{\mathrm{CC}}) \triangleq \mathrm{conic}\big(\mathcal{P}(H_{\mathrm{CC}})\big) = \bigcap_{j \in \mathcal{J}(H_{\mathrm{CC}})} \mathrm{conic}(\mathcal{C}_{\mathrm{CC},j}). $$

⁵ Examples of code families that have sub-exponential description 
complexities in the block length are convolutional codes (with fixed state-space 
size), cycle codes, and tree codes. However, these classes of codes are not 
good enough for achieving performance close to capacity even under ML 
decoding. (For more on this topic, see for example [16].) 

The fundamental cone can be characterized by the inequali- 
ties listed in the following lemma [3], [4], [5], [6]. (Similar 
inequalities can be given for the fundamental polytope but 
we will not need them here.) 

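Lemma 7 The fundamental cone $\mathcal{K} \triangleq \mathcal{K}(H_{\mathrm{CC}})$ of $H_{\mathrm{CC}}$ is 
the set of all vectors $\omega \in \mathbb{R}^n$ that satisfy 

$$ \omega_i \geq 0 \quad (\text{for all } i \in \mathcal{I}), \quad (3) $$

$$ \omega_i \leq \sum_{i' \in \mathcal{I}_j \setminus i} \omega_{i'} \quad (\text{for all } j \in \mathcal{J}, \text{ for all } i \in \mathcal{I}_j). \quad (4) $$

A vector $\omega \in \mathcal{K}$ is called a pseudo-codeword. If such a vector 
lies on an edge of $\mathcal{K}$, it is called a minimal pseudo-codeword. 
Moreover, if $\omega \in \mathcal{K} \cap \mathbb{Z}^n$ and $\omega \pmod 2 \in \mathcal{C}_{\mathrm{CC}}$, then $\omega$ is 
called an unscaled pseudo-codeword. (For a motivation of 
these definitions, see [6], [18].) 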
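
Inequalities (3) and (4) translate directly into a membership test. The sketch below is a minimal illustration: the function name `in_fundamental_cone`, the tolerance parameter `tol`, and the example matrix are assumptions made here rather than anything prescribed by the paper.

```python
def in_fundamental_cone(H, w, tol=1e-12):
    """Check inequalities (3) and (4) for a zero-one matrix H (list of rows)."""
    # (3): every component must be non-negative.
    if any(w_i < -tol for w_i in w):
        return False
    # (4): within each check j, no component may exceed the sum of the others.
    for row in H:
        I_j = [i for i, h in enumerate(row) if h != 0]
        total = sum(w[i] for i in I_j)
        if any(w[i] > (total - w[i]) + tol for i in I_j):
            return False
    return True

# Hypothetical parity-check matrix of the length-3 repetition code {000, 111}.
H = [[1, 1, 0],
     [0, 1, 1]]

print(in_fundamental_cone(H, [1, 1, 1]))   # the codeword 111 lies in the cone
print(in_fundamental_cone(H, [2, 1, 1]))   # violates (4) in the first check
print(in_fundamental_cone(H, [0, 0, 0]))   # the apex of the cone
```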

Note that in the following, not only vectors in the funda- 
mental polytope, but also vectors in the fundamental cone 
will be called pseudo-codewords. Moreover, if Hcs is a 
zero-one measurement matrix, i.e., a measurement matrix 
where all entries are in {0, 1}, then we will consider Hcs 
to represent also the parity-check matrix of some linear code 
over $\mathbb{F}_2$. Consequently, its fundamental polytope will be 
denoted by $\mathcal{P}(H_{\mathrm{CS}})$ and its fundamental cone by $\mathcal{K}(H_{\mathrm{CS}})$. 

B. Conditions for the Equivalence of CC-LPD and CC-MLD 

The following lemma states when CC-LPD succeeds for 
the BSC. 

Lemma 8 Let $H_{\mathrm{CC}}$ be the parity-check matrix of some code 
$\mathcal{C}_{\mathrm{CC}}$ and let $S \subseteq \mathcal{I}(H_{\mathrm{CC}})$ be the set of coordinate indices 
that are flipped by the BSC. If $H_{\mathrm{CC}}$ is such that 

$$ \|\omega_S\|_1 < \|\omega_{\bar{S}}\|_1 \quad (5) $$

for all $\omega \in \mathcal{K}(H_{\mathrm{CC}}) \setminus \{0\}$ then the CC-LPD decision equals 
the codeword that was sent. 

Remark: The above condition is also necessary; however, 
we will not use this fact in the following. 

Proof: Without loss of generality, we can assume that 
the all-zero codeword was transmitted. Let $+L > 0$ be 
the log-likelihood ratio associated to a received 0, and let 
$-L < 0$ be the log-likelihood ratio associated to a received 
1. Therefore, $\lambda_i = +L$ if $i \in \bar{S}$ and $\lambda_i = -L$ if $i \in S$. Then 
it follows from the assumptions in the lemma statement that 
for any $\omega \in \mathcal{K}(H_{\mathrm{CC}}) \setminus \{0\}$ 

$$ \langle \lambda, \omega \rangle = \sum_{i \in \bar{S}} (+L) \cdot \omega_i + \sum_{i \in S} (-L) \cdot \omega_i \overset{(a)}{=} L \cdot \|\omega_{\bar{S}}\|_1 - L \cdot \|\omega_S\|_1 \overset{(b)}{>} 0 = \langle \lambda, 0 \rangle, $$

where the equality in step (a) follows from the fact that $|\omega_i| = \omega_i$ 
for all $i \in \mathcal{I}(H_{\mathrm{CC}})$, and where the inequality in step (b) 
follows from (5). Therefore, under CC-LPD the all-zero 
codeword has the lowest cost function value compared to 
all the non-zero pseudo-codewords in the fundamental cone, 
and therefore also compared to all the non-zero 
pseudo-codewords in the fundamental polytope. ■ 
Note that the inequality in (5) is identical to the inequality 
that appears in the definition of the strict nullspace property 
for $C = 1$ (!). This observation makes one wonder if there is a 
connection between CS-LPD and CC-LPD, in particular for 
measurement matrices that contain only zeros and ones. Of 
course, in order to establish such a connection we first need 
to understand how points in the nullspace of the measurement 
matrix $H_{\mathrm{CS}}$ can be associated with points in the fundamental 
polytope of the parity-check matrix $H_{\mathrm{CS}}$ (now seen as a 
parity-check matrix for a code over $\mathbb{F}_2$). Such an association 
will be exhibited in Section V. However, before turning to 
that section, we will first discuss pseudo-weights, which 
are a popular way of characterizing the importance of the 
different pseudo-codewords in the fundamental cone and for 
establishing performance guarantees for CC-LPD. 

C. Definition of Pseudo-Weights 

Note that the fundamental polytope and cone are only 
a function of the parity-check matrix of the code and not 
of the channel. The influence of the channel is reflected 
in the pseudo-weight of the pseudo-codewords, so every 
channel has its pseudo-weight definition. Therefore, every 
communication channel comes with the right measure of 
distance that determines how often a fractional vertex is 
incorrectly chosen in CC-LPD. 

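Definition 9 ([19], [20], [3], [4], [5], [6]) Let $\omega$ be a non-zero 
vector in $\mathbb{R}^n_{\geq 0}$ with $\omega = (\omega_1, \ldots, \omega_n)$. 

• The AWGNC (more precisely, binary-input AWGNC) 
pseudo-weight of $\omega$ is defined to be 

$$ w_p^{\mathrm{AWGNC}}(\omega) \triangleq \frac{\|\omega\|_1^2}{\|\omega\|_2^2}. $$

• In order to define the BSC pseudo-weight $w_p^{\mathrm{BSC}}(\omega)$, 
we let $\omega'$ be the vector of length $n$ with the same 
components as $\omega$ but in non-increasing order. Now let 

$$ f(\xi) \triangleq \omega'_i \quad (i - 1 < \xi \leq i, \ 0 < \xi \leq n), $$

$$ F(\xi) \triangleq \int_0^{\xi} f(\xi') \, d\xi', $$

$$ e \triangleq F^{-1}\Big(\frac{F(n)}{2}\Big) = F^{-1}\Big(\frac{\|\omega\|_1}{2}\Big). $$

Then the BSC pseudo-weight $w_p^{\mathrm{BSC}}(\omega)$ of $\omega$ is defined 
to be $w_p^{\mathrm{BSC}}(\omega) \triangleq 2e$. 

• The BEC pseudo-weight of $\omega$ is defined to be 

$$ w_p^{\mathrm{BEC}}(\omega) \triangleq \big| \mathrm{supp}(\omega) \big|. $$

• The max-fractional weight of $\omega$ is defined to be 

$$ w_{\mathrm{max\text{-}frac}}(\omega) \triangleq \frac{\|\omega\|_1}{\|\omega\|_\infty}. $$

For $\omega = 0$ we define all of the above pseudo-weights and 
the max-fractional weight to be zero. □ 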
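
The pseudo-weights of Definition 9 are easy to evaluate numerically. The sketch below assumes the standard formulas from the pseudo-codeword literature, namely $\|\omega\|_1^2 / \|\omega\|_2^2$ for the AWGNC, $2 \cdot F^{-1}(\|\omega\|_1 / 2)$ for the BSC, the support size for the BEC, and $\|\omega\|_1 / \|\omega\|_\infty$ for the max-fractional weight. All function names are hypothetical, and the input is assumed to have non-negative entries.

```python
def l1(w):
    return sum(abs(x) for x in w)

def awgnc_pseudo_weight(w):
    """||w||_1^2 / ||w||_2^2, and 0 for the all-zero vector."""
    energy = sum(x * x for x in w)
    return l1(w) ** 2 / energy if energy > 0 else 0.0

def bec_pseudo_weight(w):
    """Number of non-zero components."""
    return sum(1 for x in w if x != 0)

def max_fractional_weight(w):
    """||w||_1 / ||w||_inf, and 0 for the all-zero vector."""
    largest = max(abs(x) for x in w)
    return l1(w) / largest if largest > 0 else 0.0

def bsc_pseudo_weight(w):
    """2 * F^{-1}(||w||_1 / 2), where F integrates the step function
    built from the components of w sorted in non-increasing order."""
    half = l1(w) / 2
    if half == 0:
        return 0.0
    cumulative = 0.0
    for i, x in enumerate(sorted(w, reverse=True)):
        if cumulative + x >= half:
            # F is linear with slope x on the interval (i, i + 1].
            return 2 * (i + (half - cumulative) / x)
        cumulative += x
    raise ValueError("expects a vector with non-negative entries")

w_flat = [1, 1, 1, 1]
w_skew = [2, 1, 1]
print(awgnc_pseudo_weight(w_flat), bsc_pseudo_weight(w_flat))
print(awgnc_pseudo_weight(w_skew), bsc_pseudo_weight(w_skew),
      bec_pseudo_weight(w_skew), max_fractional_weight(w_skew))
```

For a zero-one vector all four quantities reduce to the Hamming weight, which is why they are regarded as generalizations of it; for the skewed vector they differ.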

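A detailed discussion of the motivation and significance 
of these definitions can be found in [6]. For a parity-check 
matrix $H_{\mathrm{CC}}$ we define the minimum AWGNC pseudo-weight 
$w_p^{\mathrm{AWGNC,min}}(H_{\mathrm{CC}})$ to be 

$$ w_p^{\mathrm{AWGNC,min}}(H_{\mathrm{CC}}) \triangleq \min_{\omega \in \mathcal{P}(H_{\mathrm{CC}}) \setminus \{0\}} w_p^{\mathrm{AWGNC}}(\omega) = \min_{\omega \in \mathcal{K}(H_{\mathrm{CC}}) \setminus \{0\}} w_p^{\mathrm{AWGNC}}(\omega). $$

The minimum BSC pseudo-weight $w_p^{\mathrm{BSC,min}}(H_{\mathrm{CC}})$, the 
minimum BEC pseudo-weight $w_p^{\mathrm{BEC,min}}(H_{\mathrm{CC}})$, and 
the minimum max-fractional weight $w_{\mathrm{max\text{-}frac}}^{\mathrm{min}}(H_{\mathrm{CC}})$ 
of $H_{\mathrm{CC}}$ are defined analogously. Note that although 
$w_{\mathrm{max\text{-}frac}}^{\mathrm{min}}(H_{\mathrm{CC}})$ yields weaker performance guarantees 
than the other quantities [6], it has the advantage of being 
efficiently computable [3], [4]. 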

There are other possible definitions of a BSC pseudo-weight. 
For example, the BSC pseudo-weight of $\omega$ can also 
be taken to be 

$$ w_p^{\mathrm{BSC}'}(\omega) \triangleq \begin{cases} 2e & \text{if } \|\omega'_{\{1,\ldots,e\}}\|_1 = \|\omega'_{\{e+1,\ldots,n\}}\|_1, \\ 2e - 1 & \text{if } \|\omega'_{\{1,\ldots,e\}}\|_1 > \|\omega'_{\{e+1,\ldots,n\}}\|_1, \end{cases} $$

where $\omega'$ is defined as in Definition 9 and where $e$ is the 
smallest integer such that $\|\omega'_{\{1,\ldots,e\}}\|_1 \geq \|\omega'_{\{e+1,\ldots,n\}}\|_1$. 
This definition of the BSC pseudo-weight was, e.g., used 
in [21]. (Note that in [20] the quantity $w_p^{\mathrm{BSC}'}(\omega)$ was 
introduced as "BSC effective weight".) 

Of course, the values $w_p^{\mathrm{BSC}}(\omega)$ and $w_p^{\mathrm{BSC}'}(\omega)$ are tightly 
connected. Namely, if $w_p^{\mathrm{BSC}'}(\omega)$ is an even integer then 
$w_p^{\mathrm{BSC}'}(\omega) = w_p^{\mathrm{BSC}}(\omega)$, and if $w_p^{\mathrm{BSC}'}(\omega)$ is an odd integer 
then $w_p^{\mathrm{BSC}'}(\omega) - 1 < w_p^{\mathrm{BSC}}(\omega) < w_p^{\mathrm{BSC}'}(\omega) + 1$. 

The following lemma establishes a connection between 
BSC pseudo-weights and the condition that appears in 
Lemma 8. 



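Lemma 10 Let $H_{\mathrm{CC}}$ be the parity-check matrix of some 
code $\mathcal{C}_{\mathrm{CC}}$ and let $\omega$ be some arbitrary non-zero 
pseudo-codeword of $H_{\mathrm{CC}}$, i.e., $\omega \in \mathcal{K}(H_{\mathrm{CC}}) \setminus \{0\}$. Then for all sets 
$S \subseteq \mathcal{I}$ with $\#S < \frac{1}{2} \cdot w_p^{\mathrm{BSC}}(\omega)$, or with $\#S < \frac{1}{2} \cdot w_p^{\mathrm{BSC}'}(\omega)$, 
it holds that 

$$ \|\omega_S\|_1 < \|\omega_{\bar{S}}\|_1. $$

Proof: First, consider the statement under the assumption 
$\#S < \frac{1}{2} \cdot w_p^{\mathrm{BSC}}(\omega)$. The proof is by contradiction. 
So, assume that $\|\omega_S\|_1 \geq \|\omega_{\bar{S}}\|_1$ holds. This statement is 
clearly equivalent to the statement that $2 \cdot \|\omega_S\|_1 \geq \|\omega_S\|_1 + 
\|\omega_{\bar{S}}\|_1 = \|\omega\|_1$, which is equivalent to the statement that 
$\|\omega_S\|_1 \geq \frac{1}{2} \cdot \|\omega\|_1$. In terms of the notation in Definition 9, 
this means that 

$$ w_p^{\mathrm{BSC}}(\omega) = 2 \cdot F^{-1}\Big(\frac{\|\omega\|_1}{2}\Big) \overset{(a)}{\leq} 2 \cdot F^{-1}\big(\|\omega_S\|_1\big) \overset{(b)}{\leq} 2 \cdot F^{-1}\big(\|\omega'_{\{1,\ldots,\#S\}}\|_1\big) = 2 \cdot \#S, $$

where at step (a) we have used the fact that $F^{-1}$ is a (strictly) 
non-decreasing function and where at step (b) we have used 
the fact that the slope of $F^{-1}$ (over the domain where $F^{-1}$ is 
defined) is at least $1 / \|\omega\|_\infty$. This, however, is a contradiction 
to the assumption that $\#S < \frac{1}{2} \cdot w_p^{\mathrm{BSC}}(\omega)$. 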

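Secondly, consider the statement under the assumption
$\#\mathcal{S} < \frac{1}{2} \cdot w_{\mathrm{p}}^{\mathrm{BSC}'}(\boldsymbol{\omega})$. The proof is by contradiction. So,
assume that $\|\boldsymbol{\omega}_{\mathcal{S}}\|_1 \geq \|\boldsymbol{\omega}_{\overline{\mathcal{S}}}\|_1$ holds. With this, and
the above definition of $\boldsymbol{\omega}'$ based on $\boldsymbol{\omega}$, we have

$$\|\boldsymbol{\omega}'_{\{1,\ldots,\#\mathcal{S}\}}\|_1 \geq \|\boldsymbol{\omega}_{\mathcal{S}}\|_1 \geq \|\boldsymbol{\omega}_{\overline{\mathcal{S}}}\|_1 \geq \|\boldsymbol{\omega}'_{\{\#\mathcal{S}+1,\ldots,n\}}\|_1 .$$

If $w_{\mathrm{p}}^{\mathrm{BSC}'}(\boldsymbol{\omega})$ is an even integer then this line of inequalities
shows that $\#\mathcal{S} \geq \frac{1}{2} \cdot w_{\mathrm{p}}^{\mathrm{BSC}'}(\boldsymbol{\omega})$, which is a contradiction to the
assumption that $\#\mathcal{S} < \frac{1}{2} \cdot w_{\mathrm{p}}^{\mathrm{BSC}'}(\boldsymbol{\omega})$. If $w_{\mathrm{p}}^{\mathrm{BSC}'}(\boldsymbol{\omega})$ is an odd
integer then this line of inequalities shows that
$\#\mathcal{S} \geq \frac{1}{2} \cdot \bigl(w_{\mathrm{p}}^{\mathrm{BSC}'}(\boldsymbol{\omega}) + 1\bigr) > \frac{1}{2} \cdot w_{\mathrm{p}}^{\mathrm{BSC}'}(\boldsymbol{\omega})$,
which again is a contradiction to the assumption that
$\#\mathcal{S} < \frac{1}{2} \cdot w_{\mathrm{p}}^{\mathrm{BSC}'}(\boldsymbol{\omega})$. ■

V. Establishing a Bridge Between
CS-LPD and CC-LPD

We are now ready to establish a bridge between CS-LPD
and CC-LPD. Our main tool is a simple lemma that was
already established in [22] but for a different purpose.

Lemma 11 Let $\mathbf{H}_{\mathrm{CS}}$ be a measurement matrix that contains
only zeros and ones. Then

$$\boldsymbol{\nu} \in \operatorname{nullspace}_{\mathbb{R}}(\mathbf{H}_{\mathrm{CS}}) \quad\Longrightarrow\quad |\boldsymbol{\nu}| \in \mathcal{K}(\mathbf{H}_{\mathrm{CS}}).$$

Remark: Note that $\operatorname{supp}(\boldsymbol{\nu}) = \operatorname{supp}(|\boldsymbol{\nu}|)$.

Proof: Let $\boldsymbol{\omega} \triangleq |\boldsymbol{\nu}|$. In order to show that such a vector
$\boldsymbol{\omega}$ is indeed in the fundamental cone of $\mathbf{H}_{\mathrm{CS}}$, we need to
verify the two conditions that define the cone: non-negativity
of all components, and the inequalities
$\omega_i \leq \sum_{i' \in \mathcal{I}_j \setminus i} \omega_{i'}$ for all $j \in \mathcal{J}$ and all $i \in \mathcal{I}_j$. The way $\boldsymbol{\omega}$
is defined, it is clear that it satisfies the non-negativity
condition. Therefore, let us focus on the proof that $\boldsymbol{\omega}$
satisfies the inequalities. Namely, from
$\boldsymbol{\nu} \in \operatorname{nullspace}_{\mathbb{R}}(\mathbf{H}_{\mathrm{CS}})$ it follows that for all $j \in \mathcal{J}$,
$\sum_{i \in \mathcal{I}} h_{j,i} \nu_i = 0$, i.e., for all $j \in \mathcal{J}$,

$$\sum_{i \in \mathcal{I}_j} \nu_i = 0.$$

This implies

$$\omega_i = |\nu_i| = \Bigl| - \sum_{i' \in \mathcal{I}_j \setminus i} \nu_{i'} \Bigr| \leq \sum_{i' \in \mathcal{I}_j \setminus i} |\nu_{i'}| = \sum_{i' \in \mathcal{I}_j \setminus i} \omega_{i'}$$

for all $j \in \mathcal{J}$ and all $i \in \mathcal{I}_j$, showing that $\boldsymbol{\omega}$ indeed
satisfies the inequalities. ■

This lemma is fundamentally one-way: it says that with
every point in the real nullspace of the measurement matrix
$\mathbf{H}_{\mathrm{CS}}$ we can associate a point in the fundamental cone of
$\mathbf{H}_{\mathrm{CS}}$, but not necessarily vice-versa. Therefore a problematic
point for the real nullspace of $\mathbf{H}_{\mathrm{CS}}$ will translate to a
problematic point in the fundamental cone of $\mathbf{H}_{\mathrm{CS}}$ and hence
to bad performance of CC-LPD. Similarly, a "good" parity-
check matrix $\mathbf{H}_{\mathrm{CS}}$ must have no low pseudo-weight points
in the fundamental cone, which means that there are no
problematic points in the real nullspace of $\mathbf{H}_{\mathrm{CS}}$. Therefore
"positive" results for channel coding will translate into
"positive" results for compressed sensing, and "negative"
results for compressed sensing will translate into "negative"
results for channel coding.

Further, the lemma preserves the support of a given point
$\boldsymbol{\nu}$. That means that if there are no low pseudo-weight points
in the fundamental cone of $\mathbf{H}_{\mathrm{CS}}$ with a given support, there
are no problematic points in the real nullspace of $\mathbf{H}_{\mathrm{CS}}$ with
the same support, which allows point-wise versions of all
our results.

VI. Translation of Performance Guarantees

In this section we use the bridge between CS-LPD and
CC-LPD that was established in the previous section to
translate "positive" results about CC-LPD to "positive"
results about CS-LPD.

A. The Role of the BSC Pseudo-Weight for CS-LPD

Lemma 12 Let $\mathbf{H}_{\mathrm{CS}} \in \{0, 1\}^{m \times n}$ be a CS measurement
matrix and let $k$ be a non-negative integer. Then

$$w_{\mathrm{p}}^{\mathrm{BSC},\min}(\mathbf{H}_{\mathrm{CS}}) > 2k \quad\Longrightarrow\quad \mathbf{H}_{\mathrm{CS}} \in \mathrm{NSP}^{<}_{\mathbb{R}}(k, C{=}1).$$

Proof: Fix some $\boldsymbol{\nu} \in \operatorname{nullspace}_{\mathbb{R}}(\mathbf{H}_{\mathrm{CS}}) \setminus \{\mathbf{0}\}$. By
Lemma 11 we know that $|\boldsymbol{\nu}|$ is a pseudo-codeword of $\mathbf{H}_{\mathrm{CS}}$,
and by the assumption $w_{\mathrm{p}}^{\mathrm{BSC},\min}(\mathbf{H}_{\mathrm{CS}}) > 2k$ we know
that $w_{\mathrm{p}}^{\mathrm{BSC}}(|\boldsymbol{\nu}|) > 2k$. Then, using Lemma 10, we conclude
that for all sets $\mathcal{S} \subseteq \mathcal{I}$ with $\#\mathcal{S} \leq k$, we must have
$\|\boldsymbol{\nu}_{\mathcal{S}}\|_1 = \| |\boldsymbol{\nu}_{\mathcal{S}}| \|_1 < \| |\boldsymbol{\nu}_{\overline{\mathcal{S}}}| \|_1 = \|\boldsymbol{\nu}_{\overline{\mathcal{S}}}\|_1$. Because $\boldsymbol{\nu}$ was
arbitrary, the claim $\mathbf{H}_{\mathrm{CS}} \in \mathrm{NSP}^{<}_{\mathbb{R}}(k, C{=}1)$ clearly follows.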

■ 
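The statements of Lemma 11 and Lemma 12 can be checked numerically on a toy example. The following is a minimal sketch, not part of the original development: the function names, the tolerance `tol`, and the small matrix `H` are illustrative assumptions. It tests membership in the fundamental cone through the two defining conditions used in the proof of Lemma 11 (non-negativity, and each component being at most the sum of the other components in every check it participates in), and it tests the strict nullspace inequality $C \cdot \|\boldsymbol{\nu}_{\mathcal{S}}\|_1 < \|\boldsymbol{\nu}_{\overline{\mathcal{S}}}\|_1$ by brute force for a single vector $\boldsymbol{\nu}$. Checking a single vector does not establish the nullspace property of the matrix, which quantifies over all nonzero nullspace vectors.

```python
from itertools import combinations, product


def in_real_nullspace(H, nu, tol=1e-12):
    """True if H * nu = 0 over the reals (up to tol)."""
    return all(abs(sum(h * v for h, v in zip(row, nu))) <= tol for row in H)


def in_fundamental_cone(H, omega, tol=1e-12):
    """Check the two cone conditions for a zero-one matrix H:
    omega_i >= 0, and omega_i <= sum of the other omega_i' in each check j
    that contains i."""
    if any(w < -tol for w in omega):
        return False
    for row in H:
        idx = [i for i, h in enumerate(row) if h == 1]
        total = sum(omega[i] for i in idx)
        for i in idx:
            if omega[i] > (total - omega[i]) + tol:
                return False
    return True


def strict_nsp_holds_for(nu, k, C=1.0):
    """Brute-force check of C * ||nu_S||_1 < ||nu_Sbar||_1 for every index
    set S with #S <= k, for this one vector nu only."""
    mags = [abs(v) for v in nu]
    total = sum(mags)
    for size in range(k + 1):
        for S in combinations(range(len(nu)), size):
            on_S = sum(mags[i] for i in S)
            if not C * on_S < total - on_S:
                return False
    return True


# Illustrative zero-one matrix and a vector in its real nullspace.
H = [[1, 1, 1, 0],
     [0, 1, 1, 1]]
nu = [1.0, -2.0, 1.0, 1.0]
abs_nu = [abs(v) for v in nu]

# Lemma 11 direction: nu in the nullspace implies |nu| in the cone.
print(in_real_nullspace(H, nu), in_fundamental_cone(H, abs_nu))

# The converse can fail: the all-ones vector lies in the cone, but no
# choice of signs turns it into a real nullspace vector of H.
omega = [1.0, 1.0, 1.0, 1.0]
has_signed_preimage = any(
    in_real_nullspace(H, [s * w for s, w in zip(signs, omega)])
    for signs in product((1, -1), repeat=len(omega))
)
print(in_fundamental_cone(H, omega), has_signed_preimage)

# Strict nullspace inequality for this nu with k = 1 and k = 2.
print(strict_nsp_holds_for(nu, 1), strict_nsp_holds_for(nu, 2))
```

The all-ones example illustrates why the lemma is one-way: each check of `H` involves three positions, so no assignment of signs to three entries of magnitude one can sum to zero.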

Recent results on the performance analysis of CC-LPD 
showed that parity-check matrices constructed from expander 
graphs can correct a constant fraction (of the block length 
n) of worst case [23] and random [8], [24] errors. (These 
types of results are analogous to the so-called strong and 
weak bounds for compressed sensing, respectively.) 

These worst case error performance guarantees implicitly 
show that the BSC pseudo-weight of all pseudo-codewords 
of a binary linear code defined by a Tanner graph with sufficient
expansion (strictly larger than 3/4) must grow linearly in 
n. (A conclusion in a similar direction can be drawn for 
the random error setup.) We can therefore use our results 
to obtain new performance guarantees for CS-LPD based 
sparse recovery problems. 

Let us mention that in [9], [25] expansion arguments 
were used to directly obtain similar types of performance 
guarantees for compressed sensing; the comparison of these 
guarantees to the guarantees that can be obtained through our 
channel-coding-based arguments remains as future work. 



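B. The Role of Binary-Input Channels Beyond the BSC for
CS-LPD

In Lemma 12 we made a connection between performance
guarantees for the BSC under CC-LPD on the one hand and
the strict nullspace property $\mathrm{NSP}^{<}_{\mathbb{R}}(k, C)$ for $C = 1$ on the
other hand. In this subsection we want to mention that one
can establish a connection between performance guarantees
for a certain class of binary-input channels under CC-LPD
and the strict nullspace property $\mathrm{NSP}^{<}_{\mathbb{R}}(k, C)$ for $C > 1$.
This class of channels consists of binary-input memoryless
channels where for all output symbols the magnitude of the
log-likelihood ratio is bounded by some constant $W \in \mathbb{R}_{>0}$.
Without going into the details, the results from [26] (which
generalize results from [23]) can be used to establish this
connection. (Note that in [26], "This suggests that the
asymptotic advantage over [. . . ] is gained not by quantization,
but rather by restricting the LLRs to have finite support."
should read "This suggests that the asymptotic advantage
over [. . . ] is gained not by quantization, but rather by
restricting the LLRs to have bounded support.")

The results of this section will be discussed in more detail
in a longer version of the present paper.

C. Connection between AWGNC Pseudo-Weight and $\ell_2/\ell_1$
Guarantees

Theorem 13 Let $\mathbf{H}_{\mathrm{CS}} \in \{0, 1\}^{m \times n}$ be a measurement
matrix and let $\mathbf{s}$ and $\mathbf{e}$ be such that $\mathbf{s} = \mathbf{H}_{\mathrm{CS}} \cdot \mathbf{e}$. Moreover,
let $\mathcal{S} \subseteq \mathcal{I}(\mathbf{H}_{\mathrm{CS}})$ with $\#\mathcal{S} = k$, and let $C'$ be an arbitrary
positive real number with $C' > 4k$. Then the estimate $\hat{\mathbf{e}}$
produced by CS-LPD will satisfy

$$\|\mathbf{e} - \hat{\mathbf{e}}\|_2 \leq \frac{C''}{\sqrt{k}} \cdot \|\mathbf{e}_{\overline{\mathcal{S}}}\|_1 \quad\text{with}\quad C'' \triangleq \frac{1}{\sqrt{\frac{C'}{4k}} - 1},$$

if $w_{\mathrm{p}}^{\mathrm{AWGNC}}(|\boldsymbol{\nu}|) \geq C'$ holds for all $\boldsymbol{\nu} \in \operatorname{nullspace}_{\mathbb{R}}(\mathbf{H}_{\mathrm{CS}}) \setminus \{\mathbf{0}\}$.
(In particular, this latter condition is satisfied for a
measurement matrix $\mathbf{H}_{\mathrm{CS}}$ with $w_{\mathrm{p}}^{\mathrm{AWGNC},\min}(\mathbf{H}_{\mathrm{CS}}) \geq C'$.)

Proof: By definition, $\mathbf{e}$ is the original signal. Since
$\mathbf{H}_{\mathrm{CS}} \cdot \mathbf{e} = \mathbf{s}$ and $\mathbf{H}_{\mathrm{CS}} \cdot \hat{\mathbf{e}} = \mathbf{s}$, it easily follows that
$\boldsymbol{\nu} \triangleq \hat{\mathbf{e}} - \mathbf{e}$ is in the nullspace of $\mathbf{H}_{\mathrm{CS}}$. So,

$$
\begin{aligned}
\|\mathbf{e}_{\mathcal{S}}\|_1 + \|\mathbf{e}_{\overline{\mathcal{S}}}\|_1
  &= \|\mathbf{e}\|_1 \\
  &\overset{(a)}{\geq} \|\hat{\mathbf{e}}\|_1 \\
  &= \|\mathbf{e} + \boldsymbol{\nu}\|_1 \\
  &= \|\mathbf{e}_{\mathcal{S}} + \boldsymbol{\nu}_{\mathcal{S}}\|_1 + \|\mathbf{e}_{\overline{\mathcal{S}}} + \boldsymbol{\nu}_{\overline{\mathcal{S}}}\|_1 \\
  &\overset{(b)}{\geq} \|\mathbf{e}_{\mathcal{S}}\|_1 - \|\boldsymbol{\nu}_{\mathcal{S}}\|_1 + \|\boldsymbol{\nu}_{\overline{\mathcal{S}}}\|_1 - \|\mathbf{e}_{\overline{\mathcal{S}}}\|_1 \\
  &\overset{(c)}{\geq} \|\mathbf{e}_{\mathcal{S}}\|_1 + \bigl(\sqrt{C'} - 2\sqrt{k}\bigr) \cdot \|\boldsymbol{\nu}\|_2 - \|\mathbf{e}_{\overline{\mathcal{S}}}\|_1,
\end{aligned}
\tag{6}
$$

where step (a) follows from the fact that the solution to
CS-LPD satisfies $\|\hat{\mathbf{e}}\|_1 \leq \|\mathbf{e}\|_1$ and where step (b) follows
from applying the triangle inequality for the $\ell_1$ norm twice.
Moreover, step (c) follows from

$$
\begin{aligned}
-\|\boldsymbol{\nu}_{\mathcal{S}}\|_1 + \|\boldsymbol{\nu}_{\overline{\mathcal{S}}}\|_1
  &= \|\boldsymbol{\nu}\|_1 - 2 \cdot \|\boldsymbol{\nu}_{\mathcal{S}}\|_1 \\
  &\overset{(d)}{\geq} \sqrt{C'} \cdot \|\boldsymbol{\nu}\|_2 - 2 \cdot \|\boldsymbol{\nu}_{\mathcal{S}}\|_1 \\
  &\overset{(e)}{\geq} \sqrt{C'} \cdot \|\boldsymbol{\nu}\|_2 - 2\sqrt{k} \cdot \|\boldsymbol{\nu}_{\mathcal{S}}\|_2 \\
  &\overset{(f)}{\geq} \sqrt{C'} \cdot \|\boldsymbol{\nu}\|_2 - 2\sqrt{k} \cdot \|\boldsymbol{\nu}\|_2 \\
  &= \bigl(\sqrt{C'} - 2\sqrt{k}\bigr) \cdot \|\boldsymbol{\nu}\|_2,
\end{aligned}
$$

where step (d) follows from the assumption that
$w_{\mathrm{p}}^{\mathrm{AWGNC}}(|\boldsymbol{\nu}|) \geq C'$ for all $\boldsymbol{\nu} \in \operatorname{nullspace}_{\mathbb{R}}(\mathbf{H}_{\mathrm{CS}}) \setminus \{\mathbf{0}\}$,
i.e., $\|\boldsymbol{\nu}\|_1 \geq \sqrt{C'} \cdot \|\boldsymbol{\nu}\|_2$ for all $\boldsymbol{\nu} \in \operatorname{nullspace}_{\mathbb{R}}(\mathbf{H}_{\mathrm{CS}})$,
where step (e) follows from the inequality $\|\mathbf{a}\|_1 \leq \sqrt{k} \cdot \|\mathbf{a}\|_2$
that holds for any real vector $\mathbf{a}$ of length $k$, and where
step (f) follows from the inequality $\|\mathbf{a}_{\mathcal{S}}\|_2 \leq \|\mathbf{a}\|_2$ that holds for
any real vector $\mathbf{a}$ whose set of coordinate indices includes
$\mathcal{S}$. Subtracting the term $\|\mathbf{e}_{\mathcal{S}}\|_1$ on both sides of (6), and
solving for $\|\boldsymbol{\nu}\|_2 = \|\mathbf{e} - \hat{\mathbf{e}}\|_2$ yields the promised result. ■

D. Connection between Max-Fractional Weight and $\ell_\infty/\ell_1$
Guarantees

Theorem 14 Let $\mathbf{H}_{\mathrm{CS}} \in \{0, 1\}^{m \times n}$ be a measurement
matrix and let $\mathbf{s}$ and $\mathbf{e}$ be such that $\mathbf{s} = \mathbf{H}_{\mathrm{CS}} \cdot \mathbf{e}$. Moreover,
let $\mathcal{S} \subseteq \mathcal{I}(\mathbf{H}_{\mathrm{CS}})$ with $\#\mathcal{S} = k$, and let $C'$ be an arbitrary
positive real number with $C' > 2k$. Then the estimate $\hat{\mathbf{e}}$
produced by CS-LPD will satisfy

$$\|\mathbf{e} - \hat{\mathbf{e}}\|_\infty \leq \frac{C''}{k} \cdot \|\mathbf{e}_{\overline{\mathcal{S}}}\|_1 \quad\text{with}\quad C'' \triangleq \frac{1}{\frac{C'}{2k} - 1},$$

if $w_{\mathrm{max\text{-}frac}}(|\boldsymbol{\nu}|) \geq C'$ holds for all $\boldsymbol{\nu} \in \operatorname{nullspace}_{\mathbb{R}}(\mathbf{H}_{\mathrm{CS}}) \setminus \{\mathbf{0}\}$.
(In particular, this latter condition is satisfied for a
measurement matrix $\mathbf{H}_{\mathrm{CS}}$ with $w_{\mathrm{max\text{-}frac}}^{\min}(\mathbf{H}_{\mathrm{CS}}) \geq C'$.)

Proof: By definition, $\mathbf{e}$ is the original signal. Since
$\mathbf{H}_{\mathrm{CS}} \cdot \mathbf{e} = \mathbf{s}$ and $\mathbf{H}_{\mathrm{CS}} \cdot \hat{\mathbf{e}} = \mathbf{s}$, it easily follows that
$\boldsymbol{\nu} \triangleq \hat{\mathbf{e}} - \mathbf{e}$ is in the nullspace of $\mathbf{H}_{\mathrm{CS}}$. So,

$$
\begin{aligned}
\|\mathbf{e}_{\mathcal{S}}\|_1 + \|\mathbf{e}_{\overline{\mathcal{S}}}\|_1
  &= \|\mathbf{e}\|_1 \\
  &\overset{(a)}{\geq} \|\hat{\mathbf{e}}\|_1 \\
  &= \|\mathbf{e} + \boldsymbol{\nu}\|_1 \\
  &= \|\mathbf{e}_{\mathcal{S}} + \boldsymbol{\nu}_{\mathcal{S}}\|_1 + \|\mathbf{e}_{\overline{\mathcal{S}}} + \boldsymbol{\nu}_{\overline{\mathcal{S}}}\|_1 \\
  &\overset{(b)}{\geq} \|\mathbf{e}_{\mathcal{S}}\|_1 - \|\boldsymbol{\nu}_{\mathcal{S}}\|_1 + \|\boldsymbol{\nu}_{\overline{\mathcal{S}}}\|_1 - \|\mathbf{e}_{\overline{\mathcal{S}}}\|_1 \\
  &\overset{(c)}{\geq} \|\mathbf{e}_{\mathcal{S}}\|_1 + (C' - 2k) \cdot \|\boldsymbol{\nu}\|_\infty - \|\mathbf{e}_{\overline{\mathcal{S}}}\|_1,
\end{aligned}
\tag{7}
$$

where step (a) follows from the fact that the solution to
CS-LPD satisfies $\|\hat{\mathbf{e}}\|_1 \leq \|\mathbf{e}\|_1$ and where step (b) follows
from applying the triangle inequality for the $\ell_1$ norm twice.
Moreover, step (c) follows from

$$
\begin{aligned}
-\|\boldsymbol{\nu}_{\mathcal{S}}\|_1 + \|\boldsymbol{\nu}_{\overline{\mathcal{S}}}\|_1
  &= \|\boldsymbol{\nu}\|_1 - 2 \cdot \|\boldsymbol{\nu}_{\mathcal{S}}\|_1 \\
  &\overset{(d)}{\geq} C' \cdot \|\boldsymbol{\nu}\|_\infty - 2 \cdot \|\boldsymbol{\nu}_{\mathcal{S}}\|_1 \\
  &\overset{(e)}{\geq} C' \cdot \|\boldsymbol{\nu}\|_\infty - 2k \cdot \|\boldsymbol{\nu}_{\mathcal{S}}\|_\infty \\
  &\overset{(f)}{\geq} C' \cdot \|\boldsymbol{\nu}\|_\infty - 2k \cdot \|\boldsymbol{\nu}\|_\infty \\
  &= (C' - 2k) \cdot \|\boldsymbol{\nu}\|_\infty,
\end{aligned}
$$

where step (d) follows from the assumption that
$w_{\mathrm{max\text{-}frac}}(|\boldsymbol{\nu}|) \geq C'$ for all $\boldsymbol{\nu} \in \operatorname{nullspace}_{\mathbb{R}}(\mathbf{H}_{\mathrm{CS}}) \setminus \{\mathbf{0}\}$,
i.e., $\|\boldsymbol{\nu}\|_1 \geq C' \cdot \|\boldsymbol{\nu}\|_\infty$ for all $\boldsymbol{\nu} \in \operatorname{nullspace}_{\mathbb{R}}(\mathbf{H}_{\mathrm{CS}})$, where
step (e) follows from the inequality $\|\mathbf{a}\|_1 \leq k \cdot \|\mathbf{a}\|_\infty$ that
holds for any real vector $\mathbf{a}$ of length $k$, and where step (f)
follows from the inequality $\|\mathbf{a}_{\mathcal{S}}\|_\infty \leq \|\mathbf{a}\|_\infty$ that holds for any
real vector $\mathbf{a}$ whose set of coordinate indices includes $\mathcal{S}$.
Subtracting the term $\|\mathbf{e}_{\mathcal{S}}\|_1$ on both sides of (7), and solving
for $\|\boldsymbol{\nu}\|_\infty = \|\mathbf{e} - \hat{\mathbf{e}}\|_\infty$ yields the promised result. ■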
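To make the constants in Theorems 13 and 14 concrete, the following minimal sketch evaluates the two error bounds for given $C'$, $k$, and tail norm $\|\mathbf{e}_{\overline{\mathcal{S}}}\|_1$. The function names are illustrative assumptions. The weight functions use the ratios implied by step (d) of the two proofs, namely $\|\boldsymbol{\omega}\|_1^2 / \|\boldsymbol{\omega}\|_2^2$ for the AWGNC pseudo-weight and $\|\boldsymbol{\omega}\|_1 / \|\boldsymbol{\omega}\|_\infty$ for the max-fractional weight. The bounds are written in the form obtained by solving the last lines of the two proofs for the error norm, which gives $2 \|\mathbf{e}_{\overline{\mathcal{S}}}\|_1 / (\sqrt{C'} - 2\sqrt{k})$ and $2 \|\mathbf{e}_{\overline{\mathcal{S}}}\|_1 / (C' - 2k)$, respectively.

```python
import math


def awgnc_pseudo_weight(omega):
    """||omega||_1^2 / ||omega||_2^2 for a nonzero vector omega."""
    l1 = sum(abs(x) for x in omega)
    sq = sum(x * x for x in omega)
    return l1 * l1 / sq


def max_fractional_weight(omega):
    """||omega||_1 / ||omega||_inf for a nonzero vector omega."""
    return sum(abs(x) for x in omega) / max(abs(x) for x in omega)


def l2_l1_bound(c_prime, k, tail_l1):
    """Upper bound on ||e - e_hat||_2: (C''/sqrt(k)) * ||e_Sbar||_1
    with C'' = 1 / (sqrt(C'/(4k)) - 1). Requires k >= 1 and C' > 4k."""
    if k < 1 or c_prime <= 4 * k:
        raise ValueError("need k >= 1 and C' > 4k")
    c_double_prime = 1.0 / (math.sqrt(c_prime / (4.0 * k)) - 1.0)
    return c_double_prime / math.sqrt(k) * tail_l1


def linf_l1_bound(c_prime, k, tail_l1):
    """Upper bound on ||e - e_hat||_inf: (C''/k) * ||e_Sbar||_1
    with C'' = 1 / (C'/(2k) - 1). Requires k >= 1 and C' > 2k."""
    if k < 1 or c_prime <= 2 * k:
        raise ValueError("need k >= 1 and C' > 2k")
    c_double_prime = 1.0 / (c_prime / (2.0 * k) - 1.0)
    return c_double_prime / k * tail_l1


# Illustrative numbers: a signal that is 1-sparse up to a small tail.
tail = 0.5
print(l2_l1_bound(16.0, 1, tail))    # equals 2*tail / (sqrt(16) - 2*sqrt(1))
print(linf_l1_bound(4.0, 1, tail))   # equals 2*tail / (4 - 2*1)
print(awgnc_pseudo_weight([1.0, 1.0, 1.0, 1.0]))
print(max_fractional_weight([1.0, 2.0, 1.0]))
```

Both bounds vanish when $\|\mathbf{e}_{\overline{\mathcal{S}}}\|_1 = 0$, i.e., an exactly $k$-sparse signal supported on $\mathcal{S}$ is recovered exactly, and both blow up as $C'$ approaches its lower limit ($4k$ or $2k$).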

E. Connection between BEC Pseudo-Weight and CS-LPD 

For the binary erasure channel, CC-LPD is identical to the 
peeling decoder [17] that is just solving a system of linear 
equations by only using back-substitution. We can define 
an analogous compressed sensing problem by assuming that 
the compressed sensing decoder is given the support of the 
sparse signal e and decoding simply involves trying to re- 
cover the values of the non-zero entries by back-substitution, 
similarly to iterative matching pursuit. In this case it is clear 
that CC-LPD for the BEC and the described compressed 
sensing decoder have identical performance since back- 
substitution behaves exactly the same way over any field, 
be it the field of real numbers or any finite field. (Note that 
whereas the result of the CC-LPD for the BEC equals the 
result of the back-substitution-based decoder for the BEC, 
the same is not true for compressed sensing, i.e., CS-LPD 
with given support of the sparse signal can be strictly better 
than the back-substitution-based decoder with given support 
of the sparse signal.) 
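The back-substitution decoder with known support described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the decoder of [17]: the function name `peel_with_known_support`, the list-of-rows matrix representation, and the example matrices are hypothetical. In each pass the routine looks for a measurement (row) that involves exactly one still-unknown position of the support, solves for that entry, and repeats; it gives up when no such row exists.

```python
def peel_with_known_support(H, s, support):
    """Recover e from s = H * e by back-substitution only, assuming that
    e_i = 0 for every i outside `support`.

    Returns the recovered vector as a list of floats, or None if the
    procedure gets stuck (no row with exactly one unresolved position)."""
    n = len(H[0])
    e = [0.0] * n
    unknown = set(support)
    progress = True
    while unknown and progress:
        progress = False
        for row, s_j in zip(H, s):
            open_idx = [i for i in unknown if row[i] != 0]
            if len(open_idx) == 1:
                i = open_idx[0]
                # All other positions in this row are already resolved
                # (either recovered earlier or zero off the support).
                known = sum(row[t] * e[t] for t in range(n) if t != i)
                e[i] = (s_j - known) / row[i]
                unknown.remove(i)
                progress = True
    return e if not unknown else None


# Example where peeling succeeds: support {1, 2}.
H_ok = [[1, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 1]]
s_ok = [2.5, 1.5, -1.0]          # measurements of e = [0, 2.5, -1.0, 0]
print(peel_with_known_support(H_ok, s_ok, {1, 2}))

# Example where peeling gets stuck: every row touches both unknowns.
H_stuck = [[1, 1, 0],
           [1, 1, 1]]
print(peel_with_known_support(H_stuck, [3.0, 3.0], {0, 1}))
```

The only arithmetic used is subtraction and division by a nonzero matrix entry, which is why the same schedule of rows applies over the reals and over a finite field; whether a given support can be peeled depends only on the zero pattern of the matrix.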

VII. Conclusions and Future Work

Based on the observation that points in the nullspace of a 
zero-one matrix (considered as a real measurement matrix) 
can be mapped to points in the fundamental cone of the same 
matrix (considered as the parity-check matrix of a code over 
$\mathbb{F}_2$), we were able to establish a connection between
CS-LPD and CC-LPD.

In addition to CS-LPD, a number of combinatorial algorithms
(e.g., [27], [25], [28], [9], [29]) have been proposed
for compressed sensing problems, with the benefit of lower
decoding complexity and performance comparable to
CS-LPD. It would be interesting to investigate if the connection
of sparse recovery problems to channel coding extends in 
a similar manner for these decoders. One example of such 
a clear connection is the bit-flipping algorithm of Sipser 
and Spielman [30] and the corresponding algorithm for 
compressed sensing by Xu and Hassibi [25]. Connections of
message-passing decoders for compressed sensing problems
were also recently discussed in [31].

Other interesting directions involve using optimized chan- 
nel coding matrices with randomized or deterministic con- 
structions (e.g., see [17]) to create measurement matrices. 
Another is using ideas for improving the performance of 
a given measurement matrix (for example by removing 
short cycles), with possible theoretical guarantees. Finally, 
one interesting question relates to being able to certify in 
polynomial time that a given measurement matrix has good 
performance. 

In any case, we hope that the connection between CS- 
LPD and CC-LPD that was discussed in this paper will help 
deepen the understanding of the role of linear programming 
relaxations for sparse recovery and for channel coding, in 
particular by translating results from one field to the other. 
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